[57] ABSTRACT

A voice-controlled device of this invention includes a 
receiving/discriminating mechanism for receiving a 
DTMF signal externally input through a telephone line 
and discriminating the DTMF signal, a level detecting 
mechanism for detecting a signal level of the DTMF 
signal input to the receiving/discriminating mechanism, 
a speech recognizer for recognizing the content of a 
voice signal externally input through the telephone line, 
an input level adjustor for adjusting a signal level of the 
voice signal input to the speech recognizer on the basis 
of the signal level detection result by the level detecting 
mechanism, and a function executor for executing a 
function according to the content of the voice signal 
recognized by the speech recognizer. 

15 Claims, 3 Drawing Sheets 
VOICE-CONTROLLED APPARATUS USING TELEPHONE AND VOICE-CONTROL METHOD

This application is a continuation of application Ser. No. 07/487,445,
filed on Mar. 2, 1990, now abandoned.

BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to a voice-controlled apparatus such as a
so-called automatic answering telephone having an automatic answering
function.

2. Description of the Related Art
Some conventional automatic answering telephones have a function for
receiving a push phone dial signal. With a telephone of this type, a
user can make a telephone call to this telephone from an outdoor
telephone to hear a recorded message. In this case, the user inputs an
ID number (identification data) by depressing the push buttons on the
outdoor telephone.

The ID number is converted to a DTMF (Dual Tone Multi Frequency) signal
used as the push phone dial signal, and reaches the automatic answering
telephone through the telephone line, where it is encoded. When the
encoded DTMF signal coincides with a number registered in advance in the
automatic answering telephone, the user can hear the recorded content.

In order to hear the recorded content, push buttons are depressed to
designate a predetermined operation. For example, "1" designates a
rewind operation; "2", a fast-forward operation; "3", a playback
operation; and "4", a stop operation. The automatic answering telephone
receives these DTMF signals and executes the designated function.

However, since the above operation functions are encoded by numerals,
the user may tend to forget the correspondence between the operations
and codes.

Thus, if each operation is designated by a voice or voice input (e.g., a
word "rewind", "fast-forward", "playback", or "stop"), and the
designated operation is recognized by the automatic answering telephone,
the user can easily actuate the above operations.

In order to execute speech recognition with existing techniques, a voice
level input to a speech recognition means must be appropriate. However,
the level of a voice signal externally input through a telephone line
varies considerably. Causes of the variation are mainly present in the
state of the telephone line. More specifically, when the telephone line
suffers from a large signal transmission loss, the level of the voice
signal reaching the speech recognition means is decreased. However, when
a user makes a local telephone call, the transmission loss on the
telephone line is very small; therefore, a high-level voice signal can
be input to the speech recognition means.

More specifically, when the telephone line is connected to the speech
recognition means, the input level of a voice signal varies considerably
depending on the state of the telephone line between the calling party
and the speech recognition means. Therefore, this state is very
disadvantageous for a speech recognition apparatus, and accurate speech
recognition may not always be attained.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to solve the problem
of the level of a voice input to a speech recognition means varying
depending on the state of a telephone line, disturbing accurate speech
recognition, and to provide a voice-controlled apparatus which can
maintain a proper input level to the speech recognition means, so that
highly accurate speech recognition can be achieved.

A voice-controlled apparatus according to the present invention
comprises: receiving/discriminating means for receiving and
discriminating a DTMF signal externally input through a telephone line;
level detecting means for detecting the signal level of the DTMF signal
input to the receiving/discriminating means; speech recognition means
for recognizing the content of a voice signal externally input through
the telephone line; input level adjusting means for adjusting the signal
level of the voice signal input to the speech recognition means, on the
basis of the signal level detection result of the level detecting means;
and function executing means for executing a function according to the
content of the voice signal recognized by the speech recognition means.

With the above arrangement, an initially input DTMF signal is
discriminated, and its reception level is measured. The input level of
the speech recognition means is adjusted to a proper value on the basis
of the detected reception level. The voice signal to be recognized is
then input to the speech recognition means.

Since a DTMF signal can be satisfactorily discriminated by existing
techniques, an ID number can be discriminated. When the reception level
of the DTMF signal is detected simultaneously with discrimination of the
DTMF signal, the degree of loss caused by the telephone line can be
measured. Thus, the subsequent input level of the speech recognition
means can be properly maintained by compensating for the measured loss.
As a result, speech recognition can always be highly accurately
achieved.

Additional objects and advantages of the invention will be set forth in
the description which follows, and in part will be obvious from the
description, or may be learned by practice of the invention. The objects
and advantages of the invention may be realized and obtained by means of
the instrumentalities and combinations particularly pointed out in the
appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a
part of the specification, illustrate presently preferred embodiments of
the invention, and together with the general description given above and
the detailed description of the preferred embodiments given below, serve
to explain the principles of the invention.

FIG. 1 is a block diagram showing a voice-controlled apparatus according
to an embodiment of the present invention;

FIG. 2 shows details of a programmable gain amplifier used in the
embodiment of FIG. 1;

FIG. 3 shows a waveform diagram for explaining an operation for
detecting the level of a DTMF signal; and

FIG. 4 is a flow chart for explaining an operation of the embodiment of
FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A voice-controlled apparatus according to an embodiment of the present
invention will be described below with reference to the accompanying
drawings.

In FIG. 1, reference numeral 1 denotes a telephone at a distant place;
and 2, a telephone line. Reference numeral 3 denotes a telephone with a
network control unit 4 (to be referred to as an NCU hereinafter).
Telephone 3 has a function of receiving a voice signal input from
telephone line 2 or sending a voice signal out on telephone line 2.

Reference numeral 5 denotes a DTMF signal receiver for receiving and
encoding a DTMF signal. Reference numeral 6 denotes a level detector for
detecting the reception level of a DTMF signal input through NCU 4.
Reference numeral 7 denotes a speech responder having a conventional
voice synthesizing circuit and a circuit for driving the synthesizing
circuit. Speech responder 7 sends a voice message to the calling party.
Reference numeral 8 denotes a PGA (programmable gain control amplifier)
for controlling the magnitude of a voice level input to speech
recognition unit 9 (to be described later). Reference numeral 9 denotes
the speech recognition unit for recognizing a voice signal (word voice
signal) input from telephone line 2 through NCU 4. Reference numeral 10
denotes a tape recorder (automatic answering means) which is controlled
by controller 11 (to be described later). Reference numeral 11 denotes
the controller for controlling the entire apparatus shown in FIG. 1.

Various speech recognition techniques may be executed by speech
recognition unit 9. One example is the technique disclosed in the
following reference:

Yoichi TAKEBAYASHI et al., "TELEPHONE SPEECH RECOGNITION USING A HYBRID
METHOD", IEEE 7th International Conference on Pattern Recognition
Proceedings, (Jul. 30-Aug. 2, 1984), Montreal, Canada, pp. 1232-1235.

The present specification incorporates the disclosure of the above
reference.

FIG. 2 shows details of PGA 8 used in the embodi- 
ment shown in FIG. 1. 

An input DTMF signal is input to the inverted input 
terminal of operational amplifier 80 through resistor Ra. 
The output from operational amplifier 80 is negatively 
fed back to the inverted input terminal of operational 
amplifier 80 through feedback resistors RNF and Rb. 

Feedback resistors RNF include n series-connected resistors R1 to Rn.
Resistors R1 to Rn are connected in parallel with electronic switches S1
to Sn. In this case, gain G of the entire circuit shown in FIG. 2 is
given by:

    G = -(Rb + RNF)/Ra                (1)

Feedback resistors RNF are expressed by:

    RNF = R1 + R2 + ... + Rn          (2)

Switches S1 to Sn are turned on/off in response to n outputs from switch
driver 82. The output from switch driver 82 is determined on the basis
of a gain control instruction from controller 11.

Assume a simple example wherein the gain control instruction is 2-bit
binary data [x,y], resistors RNF include only two resistors R1 and R2,
and the number of switches Sn is 2 (S1 and S2). When [x,y] = [0,0],
since both switches S1 and S2 are set ON, RNF=0. When [x,y] = [0,1],
since switch S1 is set OFF and switch S2 is set ON, RNF=R1. When
[x,y] = [1,0], since switch S1 is set ON and switch S2 is set OFF,
RNF=R2. When [x,y] = [1,1], since both switches S1 and S2 are set OFF,
RNF=R1+R2. In this manner, since resistors RNF can have four
resistances, PGA 8 can have one of four gains according to equation (1).
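
The two-resistor example can be checked numerically. The following is a minimal Python sketch, not part of the disclosed circuit: the function names and the resistor values are hypothetical and chosen only for illustration, while the switch logic and the gain formula follow the [x,y] example and equation (1).

```python
def feedback_resistance(x, y, r1, r2):
    """RNF for gain control instruction [x, y] in the two-resistor example.

    A closed (ON) switch shorts its resistor. Following the text,
    y = 1 opens S1 and x = 1 opens S2.
    """
    rnf = 0.0
    if y == 1:  # S1 OFF, so R1 is in the feedback path
        rnf += r1
    if x == 1:  # S2 OFF, so R2 is in the feedback path
        rnf += r2
    return rnf


def pga_gain(x, y, ra, rb, r1, r2):
    """Gain G = -(Rb + RNF) / Ra, as in equation (1)."""
    return -(rb + feedback_resistance(x, y, r1, r2)) / ra


# Hypothetical resistor values in ohms (assumptions, not from the patent).
RA, RB, R1, R2 = 10_000.0, 10_000.0, 10_000.0, 20_000.0

gains = {
    (x, y): pga_gain(x, y, RA, RB, R1, R2)
    for x in (0, 1)
    for y in (0, 1)
}
for code, gain in sorted(gains.items()):
    print(code, gain)
```

With these assumed values the four instructions give four distinct inverting gains, which is the property the example relies on.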

As described above, the gain of PGA 8 is controlled 
on the basis of the gain control instruction from control- 
ler 11. 

Note that PGA 8 may be arranged as follows. That is, 
four amplifiers having different fixed gains are pre- 
pared, and a DTMF signal is input to each of these 
amplifiers. A DTMF signal, amplified with a predeter- 
mined fixed gain selected based on gain control instruc- 
tion, is extracted from one of the amplifiers. 

The operation of the arrangement shown in FIG. 1 will be described
below, with reference to the waveform diagram of FIG. 3 and the flow
chart of FIG. 4. A user makes a telephone call to telephone 3 using
outside telephone 1 (ST10). NCU 4 provided in telephone 3 is enabled to
interrupt a CPU in controller 11 (ST11). Thus, controller 11 enables
speech responder 7 to send, to telephone 1 through telephone line 2, a
message such as "I (owner) am out. Please leave a message on the
recorder, or please input ID number." (ST12). When the user depresses
the push buttons of telephone 1 to input an ID number (ST13), level
detector 6 detects the signal level of the DTMF signal (left waveform in
FIG. 3) of the ID number (ST14). In parallel with the level detection,
DTMF signal receiver 5 discriminates the content of the received DTMF
signal, and sends the discrimination result to controller 11 (ST15).
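
The level detection of step ST14 can be illustrated in software. The sketch below is an assumption-laden illustration, not the disclosed level detector 6: it synthesizes a DTMF key from the standard row and column frequencies and measures a positive peak, an RMS value, and a mean rectified value over the window. The function names, the 8 kHz sample rate, the 50 ms window, and the amplitudes are all hypothetical.

```python
import math

# Standard DTMF row and column frequencies in Hz for the 12-key pad.
ROWS = (697.0, 770.0, 852.0, 941.0)
COLS = (1209.0, 1336.0, 1477.0)
KEYS = ("123", "456", "789", "*0#")


def key_frequencies(key):
    """Return the (low, high) tone pair for one push-button key."""
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if len(key) == 1 and c >= 0:
            return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, amplitude, duration=0.05, rate=8000):
    """Synthesize a key as the sum of its two tones (assumed parameters)."""
    f_low, f_high = key_frequencies(key)
    n = int(duration * rate)
    return [
        amplitude * (math.sin(2 * math.pi * f_low * i / rate)
                     + math.sin(2 * math.pi * f_high * i / rate))
        for i in range(n)
    ]


def signal_levels(samples):
    """Return (positive peak, RMS, mean rectified value) of the window."""
    peak = max(samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    average = sum(abs(s) for s in samples) / len(samples)
    return peak, rms, average


# The same key received over a low-loss and a high-loss line (assumed levels).
strong = signal_levels(dtmf_samples("1", amplitude=0.5))
weak = signal_levels(dtmf_samples("1", amplitude=0.05))
print(strong)
print(weak)
```

Because all three measures scale linearly with the received amplitude, any of them can serve as an indicator of line loss in this simplified model.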

The CPU in controller 11 compares the sent discrimi- 
nation result (the ID number input by the user) with ID 
number table 11a registered in its own memory. If the 
discrimination result indicates non-registration (NO in 
step ST16), controller 11 enables speech responder 7 to 
send a message such as "This ID number is invalid." 
(ST17), and subsequently cancels the telephone call 
(ST18). When the telephone call is cancelled, the opera- 
tion of the apparatus shown in FIG. 1 is completed. 

If the sent discrimination result indicates registration 
of the ID number (YES in step ST16), the controller 11 
sends a message such as "Please send an operation com- 
mand." (ST19). 

After the operation command message is sent, the 
CPU in controller 11 controls the gain of PGA 8 on the 
basis of the reception level of the DTMF signal de- 
tected in step ST14 (ST20). More specifically, when the 
reception level of the DTMF signal is low, the gain of 
PGA 8 is increased; otherwise, the gain of PGA 8 is 
decreased, so that the input signal level of speech recog- 
nition unit 9 always falls within a predetermined range 
(acceptable range in FIG. 3). After the gain control, the 
CPU in controller 11 enables speech recognition unit 9 
(ST21). 
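
Step ST20 can be sketched as a lookup of the PGA setting that best compensates the measured line loss. This is only an illustration under stated assumptions: the text does not say how controller 11 computes the gain, so the reference level (the level the DTMF signal would have on a lossless line), the set of available gains, and the function name `select_gain` are all hypothetical.

```python
import math


def select_gain(dtmf_level, reference_level, gains):
    """Pick the available gain closest, in dB, to the loss-compensating gain.

    dtmf_level: measured DTMF reception level in linear units.
    reference_level: assumed level of the same tone on a lossless line
        (a calibration constant that is not given in the text).
    gains: gain magnitudes the PGA can be switched to.
    """
    if dtmf_level <= 0:
        raise ValueError("DTMF level must be positive")
    wanted = reference_level / dtmf_level  # greater than 1 on a lossy line
    return min(gains, key=lambda g: abs(20 * math.log10(g / wanted)))


AVAILABLE_GAINS = (1.0, 2.0, 4.0, 8.0)  # hypothetical PGA settings

low_loss = select_gain(0.5, 0.5, AVAILABLE_GAINS)
high_loss = select_gain(0.12, 0.5, AVAILABLE_GAINS)
print(low_loss, high_loss)
```

A low measured level selects a larger gain and a high measured level selects a smaller one, which matches the behavior described for PGA 8. Comparing in decibels rather than in linear units is a design choice of this sketch.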

The operations in steps ST20 and ST21 are completed within a very short
period of time, before the user sends the subsequent operation command.

The user sends a word (operation command) indicat- 
ing a desired operation (e.g., RECORD, PLAYBACK, 
STOP, PAUSE, REWIND, FAST FORWARD, etc.) 
from telephone 1 in response to the operation command 
request message in step ST19 (ST22). Speech recogni- 
tion unit 9 recognizes the sent voice operation com- 
mand, and inputs the recognition result to controller 11 
(ST23). The CPU in controller 11 sends an operation 
instruction corresponding to the input recognition re- 
sult (e.g., RECORD) to tape recorder 10 (ST24). Tape 
recorder 10 executes an operation (RECORD) accord- 
ing to the sent operation command (ST25). 
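
Steps ST23 to ST25 amount to mapping a recognized word to a recorder operation. The following Python sketch shows one way this dispatch could look. The `TapeRecorder` class and `handle_recognized_word` function are hypothetical stand-ins, and ignoring an unknown word is an assumption, since the text does not specify how unrecognized words are handled.

```python
class TapeRecorder:
    """Stand-in for tape recorder 10; it only logs the operations it ran."""

    def __init__(self):
        self.log = []

    def execute(self, operation):
        self.log.append(operation)


# Command words listed in the description.
OPERATIONS = {"RECORD", "PLAYBACK", "STOP", "PAUSE", "REWIND", "FAST FORWARD"}


def handle_recognized_word(word, recorder):
    """Send the operation for a recognized word to the recorder.

    Returns True if an operation was executed and False if the word is not
    a known command (assumed behavior, not stated in the text).
    """
    operation = word.strip().upper()
    if operation not in OPERATIONS:
        return False
    recorder.execute(operation)
    return True


recorder = TapeRecorder()
handle_recognized_word("rewind", recorder)
handle_recognized_word("playback", recorder)
handle_recognized_word("hello", recorder)
print(recorder.log)
```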
The operations in steps ST22 to ST25 continue as long as the telephone
is connected (NO in step ST26). If the telephone is hung up (YES in step
ST26), the operation of the apparatus shown in FIG. 1 is completed.

The user of telephone 1 need only speak a message content when he or she
wants to record a message, and the voice content is recorded on tape
recorder 10. When the message is over and telephone 1 is hung up, all
the processing operations are completed, and telephone 3 returns to its
initial state.

The above operations can be summarized as follows. When the user wants
to hear the recorded content using outside telephone 1, he depresses the
push buttons of telephone 1 to input his ID number. For example, if the
ID number is a four-digit number "1234", buttons "1", "2", "3", and "4"
are depressed in turn.

The input ID number is converted to a DTMF signal, and the DTMF signal
is input to DTMF signal receiver 5 through telephone line 2 and NCU 4 of
telephone 3. DTMF signal receiver 5 discriminates the ID number from the
input DTMF signal, and sends the discrimination result to controller 11.

When NCU 4 is enabled to interrupt controller 11, controller 11 enables
level detector 6. Level detector 6 detects positive peak value C of the
input DTMF signal (right waveform in FIG. 3) during a time period
enabled by controller 11 (time period from measurement start up to
completion of measurement), and sends the detection result to controller
11.

When controller 11 determines that DTMF signal receiver 5 detects a DTMF
signal of the predetermined number of digits, controller 11 sends a
measurement end instruction to level detector 6.

When a telephone call is made from telephone 1, tape recorder 10 starts
a recording operation. When DTMF signal receiver 5 detects that a DTMF
signal is input, controller 11 immediately sends a pause command to tape
recorder 10 to pause the recording operation.

Controller 11 checks the discriminated ID number to determine if the ID
number is valid. If the ID number is invalid, controller 11 enables
speech responder 7 to send, to telephone 1, a message "This ID number is
invalid.", and cancels the telephone call.

If the ID number is valid, speech responder 7 sends to telephone 1 a
message such as "This ID number is valid, please input a tape recorder
operation command."

Controller 11 calculates the proper input level of speech recognition
unit 9 on the basis of peak value C of the DTMF signal (or an average
value or an effective value) detected by level detector 6. Peak value C
corresponds to the state of signal loss of active telephone line 2 (that
is, as the loss is larger, peak value C becomes smaller). When the
proper level is calculated in this manner, controller 11 adjusts the
gain of PGA 8 so that a voice signal having the calculated proper level
is input to speech recognition unit 9.

After the input level of speech recognition unit 9 is adjusted,
controller 11 enables speech recognition unit 9 to allow the voice
signal to be input through telephone line 2, NCU 4, and PGA 8. The voice
signal of the operation command spoken at the handset of telephone 1 is
input to, and recognized by, speech recognition unit 9. The recognition
result is supplied to controller 11.

Note that the sound (voice) of an input operation command is a word,
e.g., "RECORD", "PLAYBACK", "STOP", "FAST FORWARD", "REWIND", or the
like.

If speech recognition unit 9 employs an unspecified speaker word
recognition system (see the above IEEE reference "Telephone Speech
Recognition Using a Hybrid Method"), the sound of any word can be
recognized, resulting in an ideal apparatus. However, when only one user
uses the automatic answering telephone, a specified speaker word
recognition system, in which user's voice pattern data is registered in
a RAM dictionary (not shown) to allow only the user to use the
apparatus, may be employed.

Controller 11 sends a command corresponding to an operation to be
instructed to logic-controlled type tape recorder 10 on the basis of the
recognition result from speech recognition unit 9. Thus, tape recorder
10 executes the instructed operation. This operation includes mechanical
operations.

In recording, when controller 11 sends a corresponding command to tape
recorder 10, tape recorder 10 executes a recording operation. A voice
signal input from telephone line 2 is recorded by tape recorder 10. On
the other hand, the voice signal is also input to speech recognition
unit 9 through PGA 8. Speech recognition unit 9 monitors a voice OFF
time using timer 9a which is enabled when the level of an input voice
envelope decreases below a predetermined level. When timer 9a detects
that no voice signal is input over a predetermined period of time,
signal e9a, indicating this, is generated to inform controller 11 of the
rest state. In this case, controller 11 determines this state as a state
of waiting for the next operation command, and sends a recognition
operation start enable command to speech recognition unit 9.

Thereafter, the voice signal input from telephone line 2 is recognized
by speech recognition unit 9, and the recognition result is sent to
controller 11. Controller 11 then sends an operation command to tape
recorder 10 to execute the predetermined operation, which in this case
is, e.g., a pause operation.

Similarly, when a fast-forward or rewind command is input from outside
telephone 1, and a playback command is sent thereafter, the recorded
content can be checked.

Tape recorder 10 can be separated from the main telephone body, and
function as an independent unit. More specifically, when the user
returns home from outside, he can hear the recorded content upon
operation of tape recorder 10. The method of operating tape recorder 10
is exactly the same as a conventional tape recorder.

In this manner, an ID number is input using a DTMF signal which can be
reliably discriminated. Upon reception of the DTMF signal, the signal
loss state of the telephone line, i.e. the reception level of the DTMF
signal, is detected. A control command of the tape recorder of the
automatic answering telephone is input by a voice from an outside
telephone, and is recognized by recognition unit 9. As a result, an
automatic answering telephone which is very simple and easy to operate
can be obtained. The voice level input to speech recognition unit 9 is
adjusted to a proper value on the basis of the reception level of the
DTMF signal (loss state of the telephone line), thus maintaining the
input level of the speech recognition unit in the proper state,
corresponding to the signal loss state of the telephone line. Thus,
highly accurate speech recognition can always be assured.

As described above, according to the present invention, there can be
provided a voice-controlled apparatus
which can always maintain the input level to a speech recognition means
in a proper state when the speech recognition means is connected, and
can assure highly accurate speech recognition.

Note that block 10 in FIG. 1 is not limited to a tape recorder. For
example, when the apparatus shown in FIG. 1 is equipped in an automatic
teller machine in a bank, a user can inquire about his or her balance
using an outdoor telephone. When block 10 is equipped in a
computer-controlled automatic baker, a computer-controlled bath, or a
computer-controlled VCR, a user can send an instruction (e.g., baking,
bath preparation, and recording of a TV program by the VCR) to the
apparatus shown in FIG. 1 by words, without using difficult operation
commands.

Additional advantages and modifications will readily occur to those
skilled in the art. Therefore, the invention in its broader aspects is
not limited to the specific details, representative devices, and
illustrated examples shown and described herein. Accordingly, various
modifications may be made without departing from the spirit or scope of
the general inventive concept as defined by the appended claims and
their equivalents.

What is claimed is:

1. A voice-controlled apparatus adapted to a telephone system,
comprising:
    means for receiving and discriminating a content of a DTMF signal
    sent from a telephone line of the telephone system;
    means for detecting a reception signal level of the DTMF signal sent
    to said discriminating means to provide a result indicative of the
    reception signal level of the DTMF signal;
    means for recognizing content of voice signal sent from the
    telephone line;
    means for adjusting a reception signal level of the voice signal
    sent to said recognizing means in accordance with the result of the
    reception signal level of the DTMF signal detected by the detecting
    means, so that an adjusted voice signal level of the voice signal is
    provided which is sufficient to ensure a proper recognition of the
    content of said voice signal by the recognizing means; and
    means for executing a specific function defined by the content of
    said voice signal recognized by said recognizing means.

2. A voice-controlled apparatus according to claim 1, wherein said
executing means includes:
    means for recording and playing back voice information; and
    means for controlling an operation of said recording/playing back
    means in accordance with the content of said voice signal recognized
    by said recognizing means.

3. A voice-controlled apparatus according to claim 1, wherein said
executing means includes:
    a computer-controlled equipment; and
    means for controlling an operation of said computer-controlled
    equipment in accordance with the content of said voice signal
    recognized by said recognizing means.

4. A voice-controlled apparatus according to claim 1, wherein said
recognizing means includes means for monitoring discontinuity of a
signal envelope of the voice signal sent from the telephone line, and
for generating a discontinuity signal for indicating when a period of
absence of the monitored signal envelope exceeds a predetermined period
of time, and
    said executing means includes means for activating said recognizing
    means in response to the generation of said discontinuity signal.

5. A voice-controlled apparatus according to claim 1, wherein said
detecting means includes means for detecting a peak level of said DTMF
signal.

6. A voice-controlled apparatus according to claim 1, wherein said
detecting means includes means for detecting an rms level of said DTMF
signal.

7. A voice-controlled apparatus according to claim 1, wherein said
detecting means includes means for detecting an average level of said
DTMF signal.

8. A voice-controlled apparatus according to claim 1, wherein said
adjusting means includes a multi-gain amplifier having a plurality of
selectable gain factors, each gain factor being selected in accordance
with the result of said signal level detection.

9. A voice-control method adapted to a telephone system, comprising the
steps of:
    discriminating a content of a DTMF signal received from a telephone
    line of the telephone system;
    detecting a reception signal level of the DTMF signal [illegible];
    recognizing content of voice signal sent from the telephone line;
    adjusting a reception signal level of the voice signal in accordance
    with the result of the DTMF signal level detection, so that an
    adjusted voice signal level is provided which ensures a proper
    recognition of the content of the voice signal; and
    executing a specific function defined by the content of said voice
    signal recognized by said recognizing step.

10. A voice-controlled method according to claim 9, wherein said
executing step includes the steps of:
    recording and playing back voice information; and
    controlling an operation of said recording/playing back step in
    accordance with the content of said voice signal recognized by said
    recognizing step.

11. A voice-controlled method according to claim 9, wherein said
executing step includes a step of controlling an operation of a
computer-controlled equipment in accordance with the content of said
voice signal recognized by said recognizing step.

12. A voice control method according to claim 9, wherein said
recognizing step includes a step of monitoring discontinuity of a signal
envelope of the voice signal sent from the telephone line, and
generating a discontinuity signal for indication when a period of
absence of the monitored signal envelope exceeds a predetermined period
of time; and
    said executing step includes a step of activating said recognizing
    step in response to the generation of said discontinuity signal.

13. A voice-control method according to claim 9, wherein said detecting
step includes a step for detecting a peak level of said DTMF signal.

14. A voice-control method according to claim 9, wherein said detecting
step includes a step for detecting an rms level of said DTMF signal.

15. A voice-control method according to claim 9, wherein said detecting
step includes a step for detecting an average level of said DTMF signal.

* * * * *